NSF PAR Search | NSF Public Access Repository

CellCover Defines Marker Gene Panels Capturing Developmental Progression in Neocortical Neural Stem Cell Identity

https://doi.org/10.7554/eLife.107531.1

Ji, Lanlan; Wang, An; Sonthalia, Shreyash; Seo, Seungmae; Naiman, Daniel Q; Younes, Laurent; Colantuoni, Carlo; Geman, Donald (October 2025, eLife)

Definition of cell classes across the tissues of living organisms is central in the analysis of growing atlases of single-cell RNA sequencing (scRNA-seq) data across biomedicine. Marker genes for cell classes are most often defined by differential expression (DE) methods that serially assess individual genes across landscapes of diverse cells. This serial approach has been extremely useful, but is limited because it ignores possible redundancy or complementarity across genes that can only be captured by analyzing multiple genes simultaneously. Interrogating binarized expression data, we aim to identify discriminating panels of genes that are specific to, not only enriched in, individual cell types. To efficiently explore the vast space of possible marker panels, leverage the large number of cells often sequenced, and overcome zero-inflation in scRNA-seq data, we propose viewing marker gene panel selection as a variation of the “minimal set-covering problem” in combinatorial optimization. Using scRNA-seq data from blood and brain tissue, we show that this new method, CellCover, performs as well as or better than DE and other methods in defining cell-type discriminating gene panels, while reducing gene redundancy and capturing cell-class-specific signals that are distinct from those defined by DE methods. Transfer learning experiments across mouse, primate, and human data demonstrate that CellCover identifies markers of conserved cell classes in neocortical neurogenesis, as well as developmental progression in both progenitors and neurons. Exploring markers of human outer radial glia (oRG, or basal RG) across mammals, we show that transcriptomic elements of this key cell type in the expansion of the human cortex likely appeared in gliogenic precursors of the rodent before the full program emerged in neurogenic cells of the primate lineage.
We have assembled the public datasets we use in this report within the NeMO Analytics multi-omic data exploration environment [1], where the expression of individual genes (NeMO: Individual genes in cortex and NeMO: Individual genes in blood) and marker gene panels (NeMO: Telley 3 CellCover Panels, NeMO: Telley 12 CellCover Panels, NeMO: Sorted Brain Cell CellCover Panels, and NeMO: Blood 34 CellCover Panels) can be freely explored without coding expertise. CellCover is available in CellCover R and CellCover Python.
Free, publicly-accessible full text available October 21, 2026
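
The set-covering view of marker panel selection described in the CellCover abstract above can be illustrated with a small sketch. This is not the CellCover implementation or its API: it is a generic greedy heuristic for a covering problem on binarized expression data, and the function name `greedy_marker_panel`, the parameters `max_off_target_rate` and `depth`, and the input layout are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def greedy_marker_panel(
    expressed: Dict[str, Set[str]],
    target_cells: Set[str],
    other_cells: Set[str],
    max_off_target_rate: float = 0.1,
    depth: int = 1,
) -> List[str]:
    """Greedy sketch of set-cover-style marker selection (hypothetical helper).

    expressed maps each gene to the set of cell IDs in which it is detected
    (binarized expression). The goal is a small panel such that each target
    cell expresses at least `depth` panel genes, using only genes detected in
    at most `max_off_target_rate` of the non-target cells.
    """
    # Specificity filter: drop genes detected too often outside the target class.
    candidates: Dict[str, Set[str]] = {}
    for gene, cells in expressed.items():
        off_rate = len(cells & other_cells) / len(other_cells) if other_cells else 0.0
        if off_rate <= max_off_target_rate:
            candidates[gene] = cells & target_cells

    # Number of additional panel genes each target cell still needs.
    remaining = {cell: depth for cell in target_cells}

    def gain(gene: str) -> int:
        # Target cells this gene would help cover that are not yet fully covered.
        return sum(1 for cell in candidates[gene] if remaining[cell] > 0)

    panel: List[str] = []
    while candidates:
        # Sorting makes tie-breaking deterministic (alphabetical by gene name).
        best = max(sorted(candidates), key=gain)
        if gain(best) == 0:
            break  # no remaining gene improves coverage
        panel.append(best)
        for cell in candidates.pop(best):
            if remaining[cell] > 0:
                remaining[cell] -= 1
    return panel
```

Because each step scores a gene by the cells it newly covers, a gene that is redundant with genes already in the panel gains nothing and is skipped, which is the contrast with gene-by-gene DE ranking that the abstract draws. A greedy pass is only an approximation to a minimal cover; an exact formulation would typically be posed as an integer program.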
projectR: an R/Bioconductor package for transfer learning via PCA, NMF, correlation and clustering

https://doi.org/10.1093/bioinformatics/btaa183

Sharma, Gaurav; Colantuoni, Carlo; Goff, Loyal A; Fertig, Elana J; Stein-O’Brien, Genevieve (March 2020, Bioinformatics)
Valencia, Alfonso (Ed.)
Abstract. Motivation: Dimension reduction techniques are widely used to interpret high-dimensional biological data. Features learned from these methods are used to discover both technical artifacts and novel biological phenomena. Such feature discovery is critically important in analysis of large single-cell datasets, where lack of a ground truth limits validation and interpretation. Transfer learning (TL) can be used to relate the features learned from one source dataset to a new target dataset to perform biologically driven validation by evaluating their use in or association with additional sample annotations in that independent target dataset. Results: We developed an R/Bioconductor package, projectR, to perform TL for analyses of genomics data via TL of clustering, correlation and factorization methods. We then demonstrate the utility of TL for integrated data analysis with an example for spatial single-cell analysis. Availability and implementation: projectR is available on Bioconductor and at https://github.com/genesofeve/projectR. Contact: gsteinobrien@jhmi.edu or ejfertig@jhmi.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Full Text Available
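
The transfer learning step described in the projectR abstract above, relating features learned in a source dataset to a new target dataset, can be sketched in a few lines. This is not the projectR API (projectR is an R package); it is a plain dot-product projection, which is a reasonable simplification only when the learned loadings are orthonormal, as with PCA. The function name `project_onto_loadings` and the dictionary-based input layout are assumptions made for illustration.

```python
from typing import Dict, List


def project_onto_loadings(
    loadings: Dict[str, List[float]],
    target: Dict[str, List[float]],
) -> List[List[float]]:
    """Project a target dataset onto features learned elsewhere (hypothetical helper).

    loadings maps gene -> weights, one weight per learned feature.
    target maps gene -> expression values, one value per target sample.
    Returns a features x samples matrix of projected scores, computed only
    over the genes present in both inputs.
    """
    shared = sorted(set(loadings) & set(target))
    if not shared:
        raise ValueError("source loadings and target data share no genes")

    n_features = len(loadings[shared[0]])
    n_samples = len(target[shared[0]])
    scores = [[0.0] * n_samples for _ in range(n_features)]

    # scores[k][j] = sum over shared genes of loadings[gene][k] * target[gene][j]
    for gene in shared:
        for k, weight in enumerate(loadings[gene]):
            for j, value in enumerate(target[gene]):
                scores[k][j] += weight * value
    return scores
```

The projected scores can then be compared against sample annotations in the target dataset, which is the validation step the abstract describes. Restricting the computation to shared genes reflects the fact that source and target datasets rarely measure identical feature sets.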
Plasma microRNAs are associated with domain-specific cognitive function in people with HIV

https://doi.org/10.1097/QAD.0000000000002966

Aparicio, Julissa Massanett; Xu, Yanxun; Li, Yuliang; Colantuoni, Carlo; Dastgheyb, Raha; Williams, Dionna W.; Asahchop, Eugene L.; McMillian, Jacqueline M.; Power, Christopher; Fujiwara, Esther; et al (January 2021, AIDS)
Full Text Available
Decomposing Cell Identity for Transfer Learning across Cellular Measurements, Platforms, Tissues, and Species

https://doi.org/10.1016/j.cels.2019.04.004

Stein-O’Brien, Genevieve L.; Clark, Brian S.; Sherman, Thomas; Zibetti, Cristina; Hu, Qiwen; Sealfon, Rachel; Liu, Sheng; Qian, Jiang; Colantuoni, Carlo; Blackshaw, Seth; et al (May 2019, Cell Systems)

Full Text Available